UNIVERSITY of WASHINGTON

# **Lecture 14: CMOS Scaling**

# Acknowledgements

All class materials (lectures, assignments, etc.) are based on material prepared by Prof.
Visvesh S. Sathe and are reproduced with his permission.



Visvesh S. Sathe
Associate Professor
Georgia Institute of Technology
https://psylab.ece.gatech.edu

UW (2013-2022), GaTech (2022-present)

# Thank you!!

- Teaching this class has been a pleasure :)
- Feel free to reach out to my personal email: diegopenac94@gmail.com
- Best of luck in your careers! I'm sure you can all become great IC designers. Can't wait to hear of your accomplishments.

#### Some resources

- "The nanosheet transistor is the next (and maybe last) step in Moore's law", Ye et al, IEEE Spectrum, 2019, https://canvas.uw.edu/files/99372084/
- "Nanoscale FinFET Technology for Circuit Designers", by Dr. Alvin Loke - Nov. 2021 (start watching at time 8:16): https://youtu.be/KdBJTqx4Y64?t=496
- Weste & Harris: Section 7.4

#### More resources

#### Moore's law:

 Moore, Gordon E. "Cramming more components onto integrated circuits, Reprinted from Electronics, volume 38, number 8, April 19, 1965, pp. 114 ff." IEEE solid-state circuits society newsletter 11.3 (2006): 33-35.

#### Dennard's scaling:

R. H. Dennard, F. H. Gaensslen, H. -N. Yu, V. L. Rideout, E. Bassous and A. R. LeBlanc, "Design of ion-implanted MOSFET's with very small physical dimensions," in IEEE Journal of Solid-State Circuits, vol. 9, no. 5, pp. 256-268, Oct. 1974, doi: 10.1109/JSSC.1974.1050511.

### Where is this lecture going?

#### **Evolution of the FET**



Taken from "The nanosheet transistor is the next (and maybe last) step in Moore's law", Ye et al, IEEE Spectrum, 2019

## The March of Compute Performance



- 3 defining trends
  - More "stuff" per mm² (Transistors, Wires)
  - Faster transistors → Faster circuits
  - Technology shifts (much less frequent: Bipolar → NMOS → CMOS → FinFET)

#### Remember this slide from lecture 1?

50 Years of Microprocessor Trend Data



Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2021 by K. Rupp.

Taken from https://github.com/karlrupp/microprocessor-trend-data

## What happened with voltage scaling?



FIGURE 7.14

Voltage scaling with feature size

Taken from W & H



Figure 1: Technology scaling trends of supply voltage and energy.

Taken from: "Near Threshold Computing: Overcoming Performance Degradation from Aggressive Voltage Scaling", Dreslinski et al, 2009

(Figure 1 annotation: stagnant V<sub>dd</sub> scaling; x-axis: technology node [nm], from 250 down to 22.)

Problem: V<sub>th</sub> and leakage current
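One way to see why V<sub>th</sub> could not keep scaling down with V<sub>dd</sub>: in a common first-order model (an assumption here, not something stated on the slide), subthreshold leakage grows by one decade for every subthreshold swing `SS` that V<sub>th</sub> is lowered. The sketch below uses a hypothetical `SS` of 100 mV/decade and illustrative threshold values; `leakage_ratio` is a made-up helper name.

```python
# Assumed first-order model: I_off is proportional to 10 ** (-V_th / SS).
# SS (subthreshold swing) of 0.1 V/decade is an illustrative value,
# not a number from the lecture.
SS_VOLTS_PER_DECADE = 0.1


def leakage_ratio(vth_old, vth_new, ss=SS_VOLTS_PER_DECADE):
    """Return I_off(new) / I_off(old) when V_th moves from vth_old to vth_new."""
    return 10 ** ((vth_old - vth_new) / ss)


if __name__ == "__main__":
    # Compare a few hypothetical threshold reductions starting from 0.4 V.
    for vth_new in (0.4, 0.3, 0.2):
        ratio = leakage_ratio(0.4, vth_new)
        print(f"V_th 0.4 V -> {vth_new} V: leakage x{ratio:.0f}")
```

Under this model every 1/S step in V<sub>th</sub> is paid for with an exponential increase in off-current, which is one way to read the "stagnant V<sub>dd</sub> scaling" annotation.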

# **CMOS Scaling**

- Successor to NMOS back in the 1980s: power efficiency
- Moore's Law: Driving CMOS scaling over the decades...
  - Transistor density on a chip doubles (2X) every 18\*\* months
  - A statement of fabrication capability and economics
  - More transistors $\rightarrow \uparrow$ functionality, complexity $\rightarrow \uparrow$ compute performance



<sup>\*\*</sup> Moore himself never claimed 18 months. The original observation was 2X every year and later 2X every 2 years.
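The difference between the doubling periods quoted above compounds quickly. A minimal sketch, assuming perfectly steady exponential growth (real scaling was never this regular); `density_growth` is a hypothetical helper name.

```python
def density_growth(years, doubling_months):
    """Growth factor in transistor density after `years` of steady doubling."""
    return 2 ** (years * 12 / doubling_months)


if __name__ == "__main__":
    # Compare the three doubling periods mentioned above over one decade.
    for months in (12, 18, 24):
        factor = density_growth(10, months)
        print(f"2X every {months} months -> about {factor:.0f}X in 10 years")
```

Over ten years the 12-month and 24-month versions differ by a factor of 32, so which period is quoted matters a great deal.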

# **CMOS Scaling**

- Feature-size shrink by $\sim$30% (a factor of $\sim$0.7) every 18 months $\rightarrow$ Smaller transistors
  - Cheaper transistors, more functionality/\$
  - Feature size **shrink** factor $S=\sqrt{2}$
- ↑ transistors/chip does not completely account for ↑ performance
  - Recall that clock frequencies have climbed over the decades
  - Smaller transistors → Faster transistors (Until the last decade)
  - Note: Wires also had to shrink (This will become relevant shortly)
- Dennard's Law: Smaller transistors = Faster transistors
  - "Constant-field scaling": Maintain E-field in the transistor across nodes
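The ~30% shrink, the factor $S=\sqrt{2}$, and the 2X density figure are tied together by simple geometry, assuming both lateral dimensions shrink by the same factor:

$$
\frac{1}{S} = \frac{1}{\sqrt{2}} \approx 0.71, \qquad
\text{area per transistor} \propto \frac{1}{S}\cdot\frac{1}{S} = \frac{1}{2}, \qquad
\text{density} \propto S^{2} = 2
$$

So one ~30% linear shrink per generation corresponds to one doubling of transistor density.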

| Parameter                              | Note                            | Dennard Scaling  |
|----------------------------------------|---------------------------------|------------------|
| L: Length                              | Enabled by Fab.                 | 1/S              |
| W: Width                               | Enabled by Fab.                 | 1/S              |
| t<sub>ox</sub>: gate oxide thickness   | Enabled by Fab.                 | 1/S              |
| V<sub>DD</sub>: supply voltage         | $E_{ox}=(V_{DD}-V_t)/t_{ox}$    | 1/S              |
| V<sub>t</sub>: threshold voltage       | $E_{ox}=(V_{DD}-V_t)/t_{ox}$    | 1/S              |
| N<sub>A</sub>: substrate doping        | Optional                        | S                |
| β                                      | Wε/(Lt<sub>ox</sub>)            | S                |
| I<sub>on</sub>: ON current             | $\beta(V_{DD}-V_t)^2$           | 1/S              |
| R: effective resistance                | $V_{DD}/I_{on}$                 | 1                |
| C: gate capacitance                    | WL/t<sub>ox</sub>               | 1/S              |
| τ: gate delay                          | $CV_{DD}/(\beta(V_{DD}-V_t)^2)$ | 1/S              |
| f: clock frequency                     | 1/τ                             | S                |
| E: switching energy / gate             | CV<sub>DD</sub><sup>2</sup>     | 1/S<sup>3</sup>  |
| P: switching power / gate              | Ef                              | 1/S<sup>2</sup>  |
| A: area per gate                       | WL                              | 1/S<sup>2</sup>  |
| Switching power density                | P/A                             | 1                |
| Switching current density              | I<sub>on</sub>/A                | S                |

- With $S = \sqrt{2}$, device current drops by 1/S
- But so do V<sub>DD</sub> and C<sub>load</sub> (1/S each)
- Overall delay: improves by 1/S
- Power per gate: drops by 1/S<sup>2</sup>
- Power density: constant
  - 1/S² area shrink makes compute cheaper, more powerful
  - BUT must be accompanied with 1/S<sup>2</sup> power savings!!!
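The derived rows of the Dennard table follow mechanically from the first five. A minimal sketch that tracks each quantity as its exponent n in S<sup>n</sup>, using only the first-order formulas from the Note column; the function name `dennard_exponents` and the dictionary keys are illustrative choices, not part of the lecture.

```python
def dennard_exponents():
    """Return {quantity: n}, meaning the quantity scales as S**n."""
    # Directly scaled parameters (constant-field scaling).
    e = {"L": -1, "W": -1, "t_ox": -1, "V_DD": -1, "V_t": -1}
    # beta ~ W / (L * t_ox)
    e["beta"] = e["W"] - e["L"] - e["t_ox"]
    # I_on ~ beta * (V_DD - V_t)**2; V_DD and V_t scale together,
    # so the overdrive scales like V_DD.
    e["I_on"] = e["beta"] + 2 * e["V_DD"]
    e["R"] = e["V_DD"] - e["I_on"]                # R ~ V_DD / I_on
    e["C"] = e["W"] + e["L"] - e["t_ox"]          # C ~ W * L / t_ox
    e["tau"] = e["C"] + e["V_DD"] - e["I_on"]     # tau ~ C * V_DD / I_on
    e["f"] = -e["tau"]                            # f ~ 1 / tau
    e["E"] = e["C"] + 2 * e["V_DD"]               # E ~ C * V_DD**2
    e["P"] = e["E"] + e["f"]                      # P ~ E * f
    e["A"] = e["W"] + e["L"]                      # A ~ W * L
    e["P/A"] = e["P"] - e["A"]
    e["I_on/A"] = e["I_on"] - e["A"]
    return e


if __name__ == "__main__":
    for name, n in dennard_exponents().items():
        print(f"{name:7s} scales as S^{n}")
```

An exponent of 0 for `P/A` is the constant power density result; the exponent of +1 for `I_on/A` is the rising current density in the last row.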

#### Think about this at home

- Ring oscillator (assume the load is presented only by devices):
  - Case 1: Only the width of the devices scales by 1/S
  - Case 2: All technology parameters (W, L, V, t<sub>ox</sub>, V<sub>th</sub>) scale by 1/S
  - Case 3: Width and length of the devices scale by 1/S
  - What is the change in delay, power, and energy per cycle?
- FETs are all held at 1 micron after scaling to the new technology generation. What is the change in delay, power, and energy?
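To check your own answers to the cases above, a small calculator can help. This is a sketch under the same first-order long-channel model as the Dennard table, assuming the load is only the gate capacitance of an identical stage and that V<sub>th</sub> scales together with V<sub>DD</sub>; `scale_gate` and its argument names are hypothetical.

```python
def scale_gate(w=1.0, l=1.0, t_ox=1.0, v=1.0):
    """Return (delay, energy, power) scale factors for a self-loaded gate.

    Arguments are multiplicative scale factors (1.0 means unchanged).
    V_th is assumed to scale with V_DD, so the overdrive scales by `v`.
    """
    beta = w / (l * t_ox)        # beta ~ W / (L * t_ox)
    i_on = beta * v ** 2         # I_on ~ beta * (V_DD - V_t)**2
    c = w * l / t_ox             # load: gate capacitance of the next stage
    delay = c * v / i_on         # tau ~ C * V_DD / I_on
    energy = c * v ** 2          # E ~ C * V_DD**2
    power = energy / delay       # P ~ E * f
    return delay, energy, power


if __name__ == "__main__":
    k = 1 / 2 ** 0.5  # one generation: 1/S with S = sqrt(2)
    cases = {
        "Case 1 (W only)": scale_gate(w=k),
        "Case 2 (all parameters)": scale_gate(w=k, l=k, t_ox=k, v=k),
        "Case 3 (W and L)": scale_gate(w=k, l=k),
    }
    for name, (delay, energy, power) in cases.items():
        print(f"{name}: delay x{delay:.3f}, energy x{energy:.3f}, power x{power:.3f}")
```

Work the cases by hand first; the model ignores wire load, velocity saturation, and leakage, so treat the numbers only as a consistency check.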

# Interconnect Scaling (think about this at home)

| Parameter                                                  | Sensitivity                             | Scale Factor     |
|------------------------------------------------------------|-----------------------------------------|------------------|
| w: width                                                   |                                         | 1/S              |
| s: spacing                                                 |                                         | 1/S              |
| t: thickness*                                              |                                         | 1/S*             |
| h: height                                                  |                                         | 1/S              |
| D <sub>c</sub> : die size                                  |                                         | D <sub>c</sub>   |
| R <sub>w</sub> : wire resistance/unit length               | 1/wt                                    | S <sup>2</sup>   |
| C <sub>wf</sub> : fringing capacitance / unit length       | t/s                                     | 1                |
| C <sub>wp</sub> : parallel plate capacitance / unit length | w/h                                     | 1                |
| C <sub>w</sub> : total wire capacitance / unit length      | $C_{wf} + C_{wp}$                       | 1                |
| t <sub>wu</sub> : unrepeated RC delay / unit length        | $R_wC_w$                                | S <sup>2</sup>   |
| t <sub>wr</sub> : repeated RC delay / unit length          | $\sqrt{RC\,R_wC_w}$*                    | $\sqrt{S}$       |
| Crosstalk noise                                            | w/h                                     | 1                |
| E <sub>w</sub> : energy per bit / unit length              | $C_w V_{DD}^2$                          | 1/S <sup>2</sup> |

[Weste, Harris]

- Interconnect MUST scale along with transistors (Why?)
- \*Thickness: To scale or not to scale...
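  - Scale it, and R↑ as S<sup>2</sup>, so RC↑ as S<sup>2</sup>.
  - Leave it unscaled, and R↑ only as S, but fringing capacitance (t/s) also ↑ as S, so RC still approaches S<sup>2</sup> when fringing dominates. Tall, narrow wires are also a fabrication challenge.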
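The thickness trade-off can be explored numerically using only the sensitivities in the table (R<sub>w</sub> ∝ 1/wt, C<sub>wf</sub> ∝ t/s, C<sub>wp</sub> ∝ w/h). The sketch below is a rough illustration rather than a field-solver result. The `fringe_fraction` parameter (the share of wire capacitance that is fringing before scaling) is a hypothetical knob introduced here, not a value from the lecture.

```python
def wire_scaling(S, scale_thickness=True, fringe_fraction=0.5):
    """Relative per-unit-length wire R, C and RC after scaling (1.0 = unchanged).

    S: scale factor (> 1 means a shrink).
    fringe_fraction: assumed pre-scaling share of C_w that is fringing (hypothetical).
    """
    w = s = h = 1.0 / S                       # width, spacing, height all shrink
    t = 1.0 / S if scale_thickness else 1.0   # thickness: scaled or left alone
    r = 1.0 / (w * t)                         # R_w  ~ 1 / (w * t)
    c_fringe = fringe_fraction * (t / s)      # C_wf ~ t / s
    c_plate = (1.0 - fringe_fraction) * (w / h)  # C_wp ~ w / h
    c = c_fringe + c_plate
    return {"R": r, "C": c, "RC": r * c}


for scaled in (True, False):
    print("thickness scaled:", scaled, wire_scaling(2.0, scale_thickness=scaled))
```

With thickness scaled, RC grows as S<sup>2</sup> regardless of `fringe_fraction`. With thickness held, the result lands between S and S<sup>2</sup> depending on how much of the capacitance is fringing.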

## Moore and Dennard Laws Part Ways...

| Parameter                              | Dennard Scaling  | Reality          |
|----------------------------------------|------------------|------------------|
| L: Length                              | 1/S              | 1/S              |
| W: Width                               | 1/S              | 1/S              |
| t <sub>ox</sub> : gate oxide thickness | 1/S              | ~1/S             |
| V <sub>DD</sub> : supply voltage       | 1/S              | 1                |
| V <sub>t</sub> : threshold voltage     | 1/S              | 1                |
| $\beta (\mu C_{ox}W/L)$                | S                | 1                |
| I <sub>on</sub> : ON current           | 1/S              | 1/S              |
| R: effective resistance                | 1                | 1                |
| Gate + Wire Capacitance                | 1/S              | 1/S < x < 1      |
| τ: gate delay                          | 1/S              | 1/S < x < 1      |
| f: clock frequency                     | S                | < S              |
| E: switching energy / gate             | 1/S <sup>3</sup> | > 1/S            |
| P: switching power / gate              | 1/S <sup>2</sup> | > 1/S            |
| A: area per gate                       | 1/S <sup>2</sup> | 1/S <sup>2</sup> |
| Switching power density                | 1                | > S              |
| Switching current density              | S                | S                |

- Velocity saturation
- V<sub>th</sub> scaling ended
- V<sub>dd</sub> scaling ended
- t<sub>ox</sub> scaling limited by gate leakage
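The "Switching power density" row follows from P/A = C·V²·f / A. The sketch below evaluates that expression for one scaling step, assuming each quantity follows a simple power law in S. The exponents passed in are illustrative corner cases chosen to bracket the "Reality" column, not measured data, and the function name is hypothetical.

```python
def power_density(S, v_exp, c_exp, f_exp, a_exp=-2):
    """Relative switching power density (C * V^2 * f / A) after one scaling step.

    Each *_exp is the exponent of S for that quantity, e.g. v_exp=-1 means V ~ 1/S.
    """
    return S ** (c_exp + 2 * v_exp + f_exp - a_exp)


S = 1.4  # a commonly quoted per-generation shrink, used here only as an example
print("Dennard (V ~ 1/S, f ~ S):       ", power_density(S, v_exp=-1, c_exp=-1, f_exp=1))
print("Constant V, frequency held flat:", power_density(S, v_exp=0, c_exp=-1, f_exp=0))
print("Constant V, frequency ~ S:      ", power_density(S, v_exp=0, c_exp=-1, f_exp=1))
```

Under Dennard scaling the density stays at 1. With V<sub>DD</sub> fixed it grows by S per generation even if frequency never improves, and by S<sup>2</sup> if frequency keeps scaling, which is why the table lists "> S".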

## **Dynamic Power**



[Intel]

- Patrick Gelsinger (later CEO of Intel) @ ISSCC 2001
  - Current trend → Power density comparable to the Sun's surface by 2010
- That did not happen, of course (clock frequency flattened, power management)
- Energy efficient design emerged as a key limiter in early 2000s

# V<sub>th</sub> Scaling



- V<sub>th</sub> scaling worked well for the longest time
  - $P_{leak} = k e^{-V_{th}}$, but $P_{leak} \ll P_{dynamic}$, so it did not matter... until recently
- Gate leakage ($\propto e^{-t_{ox}}$) also began to $\uparrow$, which prevented further $t_{ox}$ reduction
- Being able to $\downarrow$ V<sub>th</sub> is crucial for improved device performance; leakage took that option away
- This contributed to the end of V<sub>dd</sub> scaling → the ensuing efficiency crisis
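To get a feel for how steep the exponential dependence of leakage on V<sub>th</sub> is, the sketch below uses the standard subthreshold model, in which leakage changes by one decade per subthreshold swing, n·(kT/q)·ln 10. The ideality factor `n = 1.5` and the 300 K temperature are assumed example values, not numbers from the lecture, and the function names are illustrative.

```python
import math

K_B_OVER_Q = 8.617333e-5  # Boltzmann constant divided by electron charge, in V/K


def subthreshold_swing(n=1.5, temp_k=300.0):
    """Subthreshold swing in volts per decade: n * (kT/q) * ln(10)."""
    return n * K_B_OVER_Q * temp_k * math.log(10)


def leakage_ratio(delta_vth, n=1.5, temp_k=300.0):
    """Factor by which subthreshold leakage grows when Vth is lowered by delta_vth volts."""
    return 10 ** (delta_vth / subthreshold_swing(n, temp_k))


print(f"swing: {subthreshold_swing() * 1e3:.1f} mV/decade")
for dv in (0.05, 0.10, 0.20):
    print(f"lower Vth by {dv * 1e3:.0f} mV -> leakage x{leakage_ratio(dv):.1f}")
```

Each swing's worth of V<sub>th</sub> reduction costs a 10x increase in leakage, which is why V<sub>th</sub> could not keep tracking 1/S.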


## (Relatively) Recent Developments

- Dennard's law essentially died ~ early 2000s
- Process technology kept Moore's law going
- Back End Technology advances
  - Copper Interconnect
  - Low-K Dielectrics (Lower wire capacitance)
  - Tapered Back-End stack (Global Interconnect)



- Strained Silicon (Enhance μ)
  - Dual Stress Liners
  - eSiGe
- High-K Dielectrics
- Metal gate
- Tri-gate MOSFETs











[Intel]

#### Device Architecture Evolution...



Source: Anandtech

#### Continued push:

- scaled geometries
- improved short-channel performance (How well does my gate control the FET?)

## Device Architecture Evolution... (continued)

#### Evolution of the FET

**Planar FET**

Up until about 2011, planar transistors were the best devices available. The gate controls the flow of current through the channel region. Charge can leak through the channel region and waste power.

(Figure labels: Source, Drain, Dielectric)



**FinFET** 

Surrounding the channel region on three sides with the gate gives better control and prevents current leakage.



Stacked nanosheet FET

The gate completely surrounds the channel regions to give even better control than the FinFET.

#### How to make nanosheets?



A superlattice of silicon and silicon germanium is grown atop the silicon substrate.



A chemical that etches away silicon germanium reveals the silicon channel regions.



Atomic layer deposition builds a thin layer of dielectric on the silicon channels, including on the underside.



Atomic layer deposition builds the metal gate so that it completely surrounds the channel regions.

Taken from "The nanosheet transistor is the next (and maybe last) step in Moore's law", Ye et al, IEEE Spectrum, 2019

### An inverter using FinFETs



Taken from Lu, Anni, et al. "NeuroSim simulator for compute-in-memory hardware accelerator: validation and benchmark." Frontiers in Artificial Intelligence (2021): 70.


#### Nanosheet-based FET benefits






- Improved short-channel properties (gate all around)
- More gate width flexibility (more sheet widths possible)
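The width-flexibility point can be made concrete with the usual geometric estimates of effective gate width: a tri-gate fin contributes two sidewalls plus its top, while a gate-all-around sheet contributes its full perimeter. The sketch below is a simplified illustration. All dimensions are hypothetical round numbers in nm, not values from any real process, and the function names are not from the lecture.

```python
def finfet_width(n_fins, fin_height, fin_width):
    """Effective width of a tri-gate FinFET: two sidewalls plus the top of each fin."""
    return n_fins * (2 * fin_height + fin_width)


def nanosheet_width(n_sheets, sheet_width, sheet_thickness):
    """Effective width of a gate-all-around stack: full perimeter of each sheet."""
    return n_sheets * 2 * (sheet_width + sheet_thickness)


# Hypothetical dimensions in nm, for illustration only.
print("FinFET (1-3 fins):", [finfet_width(n, fin_height=50, fin_width=7) for n in (1, 2, 3)])
print("Nanosheet (3 sheets, drawn widths 15-45 nm):",
      [nanosheet_width(3, w, sheet_thickness=5) for w in range(15, 50, 10)])
```

In this model the FinFET width only changes in whole-fin steps, while the nanosheet width varies with the drawn sheet width, so a designer can pick intermediate drive strengths.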